Microbial Genomics — Latest Matching Preprints

1

Integrated Machine Learning-PanGWAS Reveals Chromosome-Encoded Persistence Networks and Plasmid Plasticity in Recurrent Urinary Tract Infection in Escherichia coli

Rajendran, S.; Nagarajan, S.; MOHAN S., S.

2026-05-22 infectious diseases 10.64898/2026.05.20.26353739 medRxiv

Top 0.1%

41.6%

Background: Recurrent urinary tract infections (rUTI) represent a major clinical challenge due to persistent clinical symptoms, repeated antibiotic exposure, and increased risk of multidrug resistance. Furthermore, clinical management of rUTI remains challenging, as existing diagnostic and treatment guidelines are largely designed for uncomplicated, acute infections. Although uropathogenic Escherichia coli (UPEC) is the predominant cause of community-acquired UTIs, the pathogen-derived genomic features that may predispose certain E. coli strains to repeatedly establish infection are not fully understood. Methods: To dissect the genetic signals across genomic compartments that distinguish rUTI-associated isolates from those causing sporadic infection, pan-genome analysis was conducted in three frameworks: (i) combined genomes (chromosome + plasmid), (ii) bacterial chromosomes only, and (iii) plasmids only. Population structure was evaluated using Gubbins, recombination-aware phylogeny with IQ-TREE, phylogroup distribution, pan-genome openness using Heaps' law, and plasmidome architecture using MOB-suite. Findings: Supervised machine learning models achieved their highest discriminatory performance on the combined genomic dataset (accuracy ~0.98). Integration of feature-selected genes with PanGWAS (Pyseer and Scoary) identified a robust set of recurrence-associated genes, namely cbtA, cbeA, and ldrD, which were consistently detected across machine learning and association frameworks. Subsequent association rule mining further revealed cooperative gene networks enriched in rUTI isolates, particularly involving toxin-antitoxin modules and metabolic regulators. Interpretation: Overall, this integrated ML-PanGWAS approach demonstrates that rUTI is a lineage-independent, polygenic phenotype encoded within a combined chromosomal-plasmid genomic context, providing new insights into the bacterial genomic architecture underlying recurrent disease and offering candidate biomarkers for future diagnostic and therapeutic development.
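The abstract above mentions assessing pan-genome openness with Heaps' law but does not give the fitting procedure. As a minimal sketch only, assuming the common power-law form P(N) = kappa * N^gamma for pan-genome size P after sampling N genomes (with gamma > 0 read as an open pan-genome), the exponent can be estimated by a log-log least-squares fit. The function names and the synthetic data below are hypothetical illustrations and are not taken from the preprint.

```python
import math


def fit_heaps_law(genome_counts, pan_sizes):
    """Fit P(N) = kappa * N**gamma by least squares on log-transformed data.

    Assumption: this is a generic log-log regression, not the authors' method.
    """
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope in log-log space
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept back-transformed
    return kappa, gamma


def is_open(gamma):
    """Under the assumed convention, a positive exponent means the
    pan-genome keeps growing as genomes are added (open)."""
    return gamma > 0


# Synthetic, made-up curve purely for illustration: 4000 * N**0.35.
counts = list(range(1, 51))
sizes = [4000 * (n ** 0.35) for n in counts]
kappa, gamma = fit_heaps_law(counts, sizes)
print(f"kappa={kappa:.1f}, gamma={gamma:.3f}, open={is_open(gamma)}")
```

In practice the pan-genome size at each N is usually averaged over many random genome orderings before fitting; that step is omitted here for brevity.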

2

Assessment of Oxford Nanopore whole genome sequencing for large-scale genomic characterisation of Staphylococcus aureus

Haugan, I.; Flatby, H. M.; Lysvand, H.; Skei, N. V.; Zaragkoulias, K.; Solligard, E.; Ronning, T. G.; Olsen, L. C.; Damas, J. K.; Afset, J. E.; As, C. G.

2026-04-01 genomics 10.64898/2026.03.30.715209 medRxiv

Top 0.1%

40.7%

Whole-genome sequencing (WGS) is increasingly being utilised in microbial diagnostics, surveillance, and research. In this paper we assess the performance of one leading long-read sequencing technology, Oxford Nanopore Technology (ONT), on 836 Staphylococcus aureus bacteraemia isolates. We compare the results to those of a leading short-read sequencing technology, Illumina. All isolates were sequenced using ONT MinION Mk1B and Illumina HiSeq or MiSeq. Libraries were prepared according to the manufacturers' instructions. Preprocessing and downstream bioinformatic analyses were performed using a combination of in-house pipelines and publicly available software tools. The average base substitution error rate in ONT assemblies was low but varied between sequence types, possibly due to lineage-specific methylation patterns. Multilocus sequence typing was similar between the technologies, while ONT assemblies allowed for better spa typing than Illumina assemblies. The reported detection rate was similar between ONT and Illumina assemblies for most virulence- and AMR-associated genes and variants. For 42 (22.2%) of 189 genes/variants, the two technologies disagreed in gene detection in five or more isolates, and in 39 of these (20.6% of the 189) the highest detection rate was found with ONT. Discrepancies were mainly associated with low GC content, multiple repetitive segments, and small plasmids. Polishing of ONT data resulted in minor changes in gene/variant calling. Our study supports the use of ONT WGS for bacterial population genomic studies on a large collection of S. aureus isolates. While assembly of ONT reads may be affected by its own methodological limitations, it was superior to Illumina assemblies in detection of potentially clinically relevant genes and variants at a low read error rate. Understanding the advantages and limitations of WGS technologies is essential before undertaking studies involving such methods on large sets of bacteria.
Author summary: In this paper, we present a practical assessment of one important whole genome sequencing (WGS) method, Oxford Nanopore Technology (ONT), and compare its performance in bacterial population genomics to that of WGS with Illumina technology. Our goal was to investigate the usefulness of ONT in studies aiming to identify clinically relevant bacterial characteristics in large collections of bacteria, such as genotype-phenotype studies. We sequenced a large set of clinical S. aureus isolates from episodes of bloodstream infections using both ONT and Illumina technologies and performed analyses with widely used software and bioinformatic pipelines. We have elucidated inherent strengths and limitations of ONT and Illumina sequencing and report some of the practical consequences of these on bacterial typing and detection of clinically relevant genes. With this study, we present one of the most comprehensive assessments of long-read sequencing technology for the genomic characterisation of clinical bacterial isolates, and the findings provide guidance for researchers considering WGS in large-scale bacterial genomics.
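The two disagreement percentages quoted in the abstract above (42 and 39 out of 189 genes/variants) can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the helper name and variable names are illustrative. It also shows that both quoted percentages share the denominator 189, and derives the share of discordant targets where ONT had the higher detection rate, a figure the abstract itself does not report.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


TOTAL_TARGETS = 189   # genes/variants assessed (from the abstract)
DISCORDANT = 42       # targets disagreeing in five or more isolates
ONT_HIGHER = 39       # discordant targets with higher detection in ONT

discordant_pct = percent(DISCORDANT, TOTAL_TARGETS)
ont_higher_pct = percent(ONT_HIGHER, TOTAL_TARGETS)
# Derived here, not stated in the abstract:
ont_share_of_discordant = percent(ONT_HIGHER, DISCORDANT)

print(discordant_pct, ont_higher_pct, ont_share_of_discordant)
```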

3

Genomic Characterization of the RyC collection: 50 Multidrug Resistant Clinical Isolates of Escherichia coli and Klebsiella spp.

Rodera-Fernandez, P.; Sastre-Dominguez, J.; Costas, C.; Alonso-del-Valle, A.; de la Fuente, J.; Hernandez-Garcia, M.; Canton, R.; Santos-Lopez, A.; San Millan, A.

2026-05-18 microbiology 10.64898/2026.05.18.725816 medRxiv

Top 0.1%

37.6%

Antimicrobial resistance (AMR) is a major global public health threat, and Enterobacterales producing extended-spectrum β-lactamases (ESBLs) represent some of the most common and concerning pathogens in clinical settings. Importantly, the dissemination of these resistance mechanisms is largely driven by mobile genetic elements (MGEs), particularly plasmids. Advancing our understanding of AMR evolution through experimentation requires moving beyond domesticated laboratory strains and towards clinically relevant isolates. However, despite the abundance of genomic data in public repositories, there is a lack of well-characterised clinical collections available for experimental work. Here, we characterise the RyC collection, which includes 50 multidrug-resistant, ESBL-producing Escherichia coli and Klebsiella spp. strains isolated from the gut microbiota of hospitalised patients at Hospital Universitario Ramon y Cajal (Madrid, Spain). We generated high-quality genome assemblies for all strains using a combination of short- and long-read sequencing technologies. From these data, we performed a comprehensive characterisation of the pangenome, mobilome, resistome and defensome of the collection. We present the RyC collection as a robust and experimentally tractable resource to study AMR evolution and MGE dynamics in clinically relevant bacterial backgrounds.
Impact statement: Antimicrobial resistance (AMR) is a growing global health threat driven by the rapid dissemination of resistance genes among clinically relevant bacteria. A major challenge in studying AMR evolution is the reliance on domesticated laboratory strains, which poorly represent the complexity of pathogens circulating in hospitals. Here, we introduce the RyC collection, a set of well-characterised, multidrug-resistant Enterobacterales isolates obtained from hospitalised patients. By combining high-quality genome sequencing with detailed analyses of their gene content and mobile genetic elements (MGEs), this collection provides a realistic and experimentally tractable system to study how resistance evolves and spreads. The RyC collection will facilitate research on AMR dynamics, plasmid biology and host-MGE interactions, ultimately contributing to the development of more effective strategies to combat antibiotic-resistant infections.

4

Epidemiology of Legionella: Genome-bAsed Typing (el_gato) - a new bioinformatic tool for identifying sequence-based types of Legionella pneumophila from whole genome sequencing data

Collins, A. J.; Mashruwala, D.; Chivukula, V.; Kozak-Muiznieks, N. A.; Rishishwar, L.; Norris, E. T.; Willby, M. J.; Hamlin, J.; Overholt, W. A.

2026-03-23 bioinformatics 10.64898/2026.03.20.713011 medRxiv

Top 0.1%

37.5%

Sequence-based typing (SBT) via Sanger sequencing has been the standard for describing Legionella pneumophila relationships for two decades. SBT involves sequencing seven loci, identifying alleles using the United Kingdom Health Security Agency (UKHSA) database, and inferring the corresponding sequence type (ST). While similar SBT approaches for other organisms can be easily adapted to whole genome sequencing (WGS), L. pneumophila presents two known challenges for this adaptation: multiple copies of one locus (mompS) and extensive heterogeneity in a second locus (neuA/neuAh). Although several computational methods have been proposed to address these issues, a WGS-based replacement with equal resolution to traditional SBT has been elusive. To address this gap, we developed el_gato (Epidemiology of Legionella: Genome-bAsed Typing; https://github.com/CDCgov/el_gato), which offers several advantages over existing methods: (1) a novel approach for resolving multiple mompS alleles identified in the same isolate, (2) the ability to capture diverse neuA/neuAh alleles, (3) fast runtime with an average of 27.7 seconds per sample, (4) easy installation via Bioconda or Docker and (5) an updated database as of March 2025. el_gato works with either paired-end short reads or genome assemblies, performing more accurately with paired-end short reads of at least 250 base pairs (bp) in length. We compared el_gato against two other in silico SBT tools ("mompS", hereafter referred to as the mompS tool, and "legsta") using a dataset of 441 isolates with sequence types (STs) previously determined by Sanger-based sequencing. el_gato correctly identified the ST for 98.9% of the test isolates, compared to 95.2% for the mompS tool and 42.2% for legsta, demonstrating a significant improvement in ST identification over the mompS tool (adjusted p = 1.06e-3) and legsta (adjusted p = 4.24e-55). Furthermore, el_gato's determination of ST was not significantly different from Sanger sequencing (adjusted p = 0.442). In summary, el_gato significantly improves in silico SBT and, given its growing adoption, is poised to support the public health community.

5

Salmonella Genomic Markers for Risk to Food Safety

Waters, E. V.; Hill, C.; Orzechowska, B.; Cook, R.; Jorgensen, F.; Chattaway, M. A.; Langridge, G. C.

2026-03-30 genomics 10.64898/2026.03.27.714810 medRxiv

Top 0.1%

33.6%

Foodborne non-typhoidal Salmonella remains a major public health concern, yet routine surveillance recovers large numbers of isolates from food that are not associated with human illness. Studies have shown that foodborne isolates can be genetically linked to clinical cases, highlighting a critical challenge for risk assessment and outbreak prioritisation. This study aimed to determine whether genomic markers can distinguish foodborne Salmonella strains with an increased likelihood of causing infection. Whole-genome sequencing data from over 900 Salmonella isolates recovered from food and the environment through UK Health Security Agency surveillance were analysed using hierarchical clustering to define genetically related groups. These clusters were expanded using the global EnteroBase database to provide broader epidemiological context. Genome-wide association analyses identified genetic markers associated with clusters containing clinical isolates, including phage-associated regions. A highly conserved 7 kb marker identified in S. Agona demonstrated strong predictive performance at a global scale, with high sensitivity and specificity for infection-associated lineages and strict serovar restriction. Comparative genomic analysis revealed that all markers localised to a shared chromosomal hotspot corresponding to a prophage integration site. The 7 kb risk-associated marker formed part of a larger prophage closely related to the well-characterised S. Typhimurium Fels-2 phage, which encodes a DNA invertase linked to phase variation, a mechanism known to promote phenotypic heterogeneity and host adaptation. As these S. Agona isolates are monophasic, our findings indicate that our genome-wide association approach has rediscovered this DNA invertase, which is known to contribute to infection risk, but in a different serovar and via an alternative regulatory mechanism. Overall, this work demonstrates the potential to move beyond treating all foodborne Salmonella isolates as equivalent hazards, towards a genomics-informed framework for risk stratification. This approach provides a foundation to improve risk-based decision-making, enhance outbreak investigations and enable earlier prioritisation of public health responses during Salmonella surveillance and control.
Author summary: Foodborne Salmonella infections remain a major public health concern, but not all strains pose the same risk to human health. Here we investigated whether genetic differences could explain why some foodborne strains are more likely to cause human infection. We analysed over 900 genomes from food and environmental sources, grouping closely related strains before placing them in a global context using EnteroBase. By combining pangenome and genome-wide association analyses, we identified distinct lineages within several serovars that differed in their association with human cases. In Salmonella Agona, all clinical isolates belonged to a single lineage carrying a highly conserved 7 kb marker that was absent from low-risk strains. This marker demonstrated strong sensitivity and specificity across global datasets and was located within a prophage closely related to the well-characterised Fels-2 phage. This region encodes a DNA invertase previously linked to phase variation, a mechanism that promotes bacterial adaptability. Our findings indicate that infection risk can be structured at the lineage level and influenced by mobile genomic elements, particularly prophages, that enhance environmental persistence and host adaptation. This work advances genomic surveillance from retrospective linkage towards mechanistic and predictive risk assessment, with direct relevance for supporting risk-based decision-making during outbreak investigations.

6

Population analysis and host-disease associations of Shiga toxin-producing Escherichia coli from various sources across eleven European countries using whole genome sequencing

Tozzoli, R.; Schadron, T.; Knijn, A.; De Sabato, L.; Morabito, S.; Montalbano Di Filippo, M.; Fiskebeck, E.; Johannessen, G.; Antony-Samy, J. K.; Good, L.; Soderlund, R.; van Hoek, A.; Mughini Gras, L.; Franz, E.; Wieczorek, K.; Scavia, G.; Moro, O.; Chiani, P.; Michelacci, V.; Burgess, C. M.; Duffy, G.; Rodgers, J.; Kirchner, M.; Pista, A.; Silveira, L.; Amaro, A.; Clemente, L.; Chattaway, M. A.; Jenkins, C.; Dallman, T.; Schjorring, S.; Scheutz, F.; Byrne, B.; Gutierrez, M.; Lopez-Chavarrias, V.; Ugarte-Ruiz, M.; Brandal, L.; Naseer, U.; Kolackova, I.; Zomer, A. L.; Wagenaar, J. A.; Pires, S

2026-04-28 genomics 10.64898/2026.04.27.721056 medRxiv

Top 0.1%

32.9%

Shiga toxin-producing Escherichia coli (STEC) are important foodborne pathogens, able to cause severe disease in humans. In the DiSCoVeR project (https://onehealthejp.eu/jrp-discover/) a STEC inventory from human and non-human sources from 11 European countries was set up and ≥3500 strains were sequenced to perform comparative genomics analysis. We used this dataset to assess STEC population structure and to investigate potential associations between genomic features, host reservoirs and symptoms. Most STEC isolates analysed by Whole Genome Sequencing (WGS) in this study were collected between 2010 and 2020. An ad hoc pipeline was deployed for a harmonised characterization of the STEC in the database, allowing serotyping, stx gene subtyping, 7-locus MLST, virulotyping and cgMLST. The results were analysed with Principal Coordinates Analysis (PCoA) in relation to isolation source to assess clustering of STEC subpopulations. When human STEC data were analysed, the PCoA revealed three distinct human STEC subpopulations (STEC_1, STEC_2 and STEC_3), which were further analysed for associations between genomic features, symptoms and variance. The non-human STEC showed a more dispersed distribution, except for one subpopulation with genes linked to specific host species, and some virulence profiles overlapping with the STEC_1 population. In conclusion, our analysis identified distinct STEC subpopulations from human cases, each characterized by specific genetic features and associated with varying proportions of severe disease outcomes. These findings provide novel insights supporting the risk assessment of STEC.
Impact statement: This study is based on the establishment of a One Health STEC genomes database, including sequences from isolates of different sources. Most of the isolates had been isolated in the ten-year time span 2010-2020, in 11 different countries, for surveillance and monitoring activities or specific surveys and research purposes. The final dataset included the whole genome sequences of 3,418 STEC isolates, mainly from human cases of infection. The metadata included the host symptoms, where available, for human STEC strains and the animal source the strains had been isolated from. We set up a pipeline for the harmonized analysis of STEC WGS, called Discover, made available through the ARIES webserver or GitHub. The analysis allowed a deep characterization of STEC strains circulating in Europe. We used this resource to assess STEC population structure and to investigate, by PCoA, potential associations between genomic features, host reservoirs, and various symptoms associated with STEC infection. This analysis highlighted the presence of subpopulations of human STEC associated with specific features. We provide new information useful for risk characterization, as well as a large genome database and associated metadata compiled from STEC strains, representing a valuable resource for the scientific community and enabling further investigations into STEC diversity, evolution, source attribution and public health relevance.
Data summary: The authors confirm all supporting data, including sequence data accession numbers, code and protocols, have been provided within the article or through supplementary data files. One supplementary method and five supplementary tables are available with the online version of this article.

7

Mobile element-mediated carbapenem resistance in Enterobacter hormaechei in a Nigerian intensive care unit

Mba, I. E.; Odih, E. E.; Adekanmbi, O.; Oaikhena, A. O.; Sunmonu, G. T.; Adebiyi, I.; Gbaja, A. T.; Animashaun, O.; Osadebamwen, P.; Idowu, O.; Aanensen, D. M.; Okeke, I. N.

2026-04-10 microbiology 10.64898/2026.04.09.712135 medRxiv

Top 0.1%

32.6%

Carbapenem-resistant Gram-negative bacteria pose a critical public health threat. The role of mobile genetic elements in driving their transmission and persistence remains poorly defined. In 2022, we investigated a suspected outbreak of carbapenem-resistant Acinetobacter baumannii (CRAB) in a Nigerian adult intensive care unit (ICU), using short-read whole genome sequencing (WGS) of carbapenem-resistant clinical and environmental isolates during the cluster period. Mobile element dynamics were then inferred from hybrid assemblies of Illumina and Oxford Nanopore reads. The suspected CRAB outbreak was ruled out by WGS, but a carbapenem-resistant Enterobacter hormaechei ST114 bloodstream isolate was found to be indistinguishable from two environmental isolates, all recovered during the Acinetobacter surge. Hybrid assemblies revealed a strikingly conserved ~19 kb resistance island shared across all ST114 genomes. The island contained a blaNDM-5 cassette alongside many other antimicrobial resistance genes, within class 1 integrons and flanked by insertion sequences, located on a 46,176 bp plasmid. Using the ST114 plasmid's hybrid assembly as a scaffold, the same plasmid was identified in the genome of a Klebsiella pneumoniae ST15 isolate from the ICU environment during the same period. Additionally, re-interrogation of genomic surveillance data uncovered four clonal 2020 ST109 Enterobacter bloodstream isolates from the same facility that carried the resistance genes in the same context on a large 267,242 bp plasmid. Carbapenem resistance in hospital Enterobacterales is driven by both clonal expansion and horizontal spread of mobile resistance elements. These findings underscore the need to track mobile elements alongside bacterial lineages to inform evidence-based infection control, especially in low-resource settings.
Impact statement: Carbapenem resistance among Enterobacterales remains a major public health threat, yet how mobile genetic elements contribute to their persistence and spread in hospital settings is still poorly understood. In this study, we investigated a suspected outbreak of carbapenem-resistant Acinetobacter baumannii in an adult intensive care unit in Nigeria. Although the outbreak was eventually ruled out, genomic analysis has shown the importance of careful interpretation of suspected outbreak cases in hospital settings. Our findings highlight the importance of close monitoring of ICU environments, the implementation of blood culture-based diagnostics, and the value of genomic support in outbreak investigations. These findings demonstrate that carbapenem resistance in hospital Enterobacterales is driven not only by clonal expansion but also by the horizontal dissemination of a highly stable blaNDM-5-associated MDR island capable of integrating into diverse plasmid backbones. This study emphasizes the need for genomic surveillance that tracks both mobile elements and bacterial lineages to strengthen outbreak investigations, especially in low-resource settings. It further underscores the links between clinical and environmental AMR reservoirs and reinforces the value of a One Health approach to controlling carbapenem resistance.
Data summary: FASTQ sequences were deposited in the NCBI BioSample database under accession numbers SAMN55915584 - SAMN55915597.

8

Genomic epidemiology and transmission dynamics of plasmids carrying New Delhi metallo-β-lactamase (blaNDM) at a single hospital system over five years

Raabe, N. J.; Mills, E. J.; Bapat, S.; Griffith, M. P.; Shutt, K.; Waggle, K. D.; Sundermann, A. J.; Shields, R. K.; Pless, L.; Snyder, G. M.; Harrison, L. H.; Van Tyne, D.

2026-05-18 infectious diseases 10.64898/2026.05.14.26353212 medRxiv

Top 0.1%

27.4%

Background: Conjugative plasmids encoding New Delhi metallo-beta-lactamase (blaNDM) pose a threat for the spread of carbapenem resistance among healthcare-acquired pathogens. Plasmid-associated outbreaks of blaNDM-producing bacteria can involve multiple bacterial species and persist over long time periods, making their detection and control difficult. We systematically studied the genomic epidemiology of blaNDM-encoding plasmids detected within a single hospital system over a five-year period. Methods: blaNDM-producing isolates were collected from clinical cultures as part of the Enhanced Detection System for Healthcare-Associated Transmission (EDS-HAT) genomic sequencing active surveillance program, or during infection prevention and control (IP&C) investigations. Isolates were identified as blaNDM producers by polymerase chain reaction (PCR); the presence of plasmid-encoded blaNDM genes was confirmed by sequencing on both Illumina and Oxford Nanopore platforms. Plasmids were clustered using Pling and bacterial relatedness of host isolates was evaluated with split k-mer analysis. Electronic health record data were used to identify shared unit-level spatiotemporal exposures and epidemiologic links within both plasmid and host clusters. Results: We identified 61 blaNDM-producing isolates collected from 54 patients sampled between November 2020 and July 2025. Isolates belonged to 15 Enterobacterales species; Enterobacter hormaechei was the most frequently sampled species (n=23, 37%), and blaNDM-5 was the most frequently observed blaNDM allele (n=36, 59%). We observed six clusters of genetically similar blaNDM-encoding plasmids each containing 2-28 isolates, and eight singleton plasmids. The two largest plasmid clusters consisted of a highly conserved 46 kb IncX3 family blaNDM-5-encoding plasmid (n=28 plasmids, 9 species) and a more variable 98-201 kb IncC family blaNDM-1-encoding plasmid (n=12 plasmids, 6 species). Epidemiologic investigation paired with whole genome sequencing identified spatiotemporal associations between shared patient exposures and putative plasmid and bacterial transmission clusters, suggesting that unit-level exposures contribute to plasmid dissemination. Finally, analysis of publicly available sequences showed that the most prevalent plasmids detected, IncX3(blaNDM-5) and IncC(blaNDM-1), also demonstrated high global prevalence. Conclusions: This study demonstrates the diversity of blaNDM-carrying plasmids within a single hospital system and their capacity to cause prolonged, multispecies outbreaks. Integrating whole genome sequencing with epidemiologic data identified unit-level spatiotemporal overlap as a likely contributor to plasmid dissemination in the hospital.

9

Ecological and molecular drivers of ESBL plasmid dissemination in Enterobacteriaceae in Vietnam

Pham, P.; Quynh, P. T. M.; Nguyen, Q.; Tuyen, H. T.; To, N. T. N.; Nhi, L. T. Q.; Duong, V. T.; Lan, N. P. H.; Baker, S.; Thwaites, G.; Rabaa, M. A.; Chung The, H.; Tang, C. M.; Duy, P. T.

2026-05-08 infectious diseases 10.64898/2026.05.06.26352499 medRxiv

Top 0.1%

23.5%

Background: Extended-spectrum beta-lactamase (ESBL)-producing Enterobacteriaceae are among the WHO's highest-priority antibiotic-resistant pathogens. Plasmids are the main drivers of ESBL dissemination, yet their reservoirs and transmission dynamics remain poorly understood in low- and middle-income countries such as Vietnam, where infections caused by ESBL-producing bacteria are prevalent. Methods: Here, we characterised the genetic structure of 68 ESBL-encoding conjugative plasmids isolated from the human gut microbiome (HGM) of healthy Vietnamese children. We further examined the extent of ESBL plasmid transfer between the HGM and human disease-causing Enterobacteriaceae pathogens (including Shigella sonnei, non-typhoidal Salmonella, extraintestinal pathogenic Escherichia coli ST131), as well as E. coli isolated from animals. Results: The dominant plasmid Inc groups found in cephalosporin-resistant human gut bacteria were IncF, IncB/O/K/Z and IncI1, carrying mainly blaCTX-M-14, blaCTX-M-15, blaCTX-M-27 and blaCTX-M-55. Plasmids from the HGM, rather than those from animal E. coli, shared higher genetic similarity with plasmids in human pathogens, suggesting that the human gut is the main reservoir for clinically relevant ESBL plasmids. We also found that widespread ESBL plasmid variants exhibited higher conjugation frequencies, facilitating broader geographical and host dissemination. In contrast, less mobile plasmids persisted mainly through clonal expansion of their bacterial hosts. Conclusions: These findings highlight the central role of the human gut as a reservoir for ESBL plasmids and provide insights into the biological factors contributing to their successful spread among pathogenic Enterobacteriaceae. Our work underscores the need for targeted interventions to reduce colonization and transmission of ESBL-producing bacteria within the human gut.

10

Enteroaggregative Escherichia clade I from Nigeria

Dada, R. A.; Akinlabi, O. C.; Tytler, B. A.; Olayinka, B. O.; Page, A. J.; Thomson, N.; Okeke, I. N.

2026-04-22 microbiology 10.64898/2026.04.21.719883 medRxiv

Top 0.1%

23.0%

Escherichia coli, the Escherichia type species, is present in mammalian and avian intestinal microbiota, and includes both commensals and pathogens. Other Escherichia species are understudied because they are less commonly associated with human disease and because of a paucity of tools that can correctly delineate them from E. coli. However, other species of this genus, including Escherichia albertii and Escherichia fergusonii, are repeatedly reported as diarrhoeagenic. We hypothesized that some bacteria fitting the definition of enteroaggregative E. coli (EAEC) belong to species other than E. coli. We used phylogeny to determine the species of 2,818 Escherichia genomes from diarrhoea epidemiology studies in Nigeria. Phylogeny speciation was confirmed using GTDB-Tk and ClermonTyping. Virulence genes were detected using ARIBA with the VirulenceFinder database, and multilocus sequence typing was performed using the Achtman scheme. Fourteen non-coli Escherichia genomes were identified: Escherichia clade I ST485 (11), Escherichia ruysiae ST5792 (2) and Escherichia fergusonii ST5636 (1). All the Escherichia clade I ST485 genomes carry the EAEC virulence genes aap, aar, astA and air, as well as the hlyF, eatA, tsh, traT, and chuA virulence genes. Interestingly, 62% of enteroaggregative Escherichia clade I ST485 genomes listed on EnteroBase are from African isolates, despite only 3% of genomes overall coming from the continent. Our results suggest that non-coli Escherichia species are infrequently isolated from human stool, but, when they are, they are misidentified as E. coli so that their significance is largely overlooked. Escherichia clade I ST485 is a globally disseminated enteroaggregative Escherichia clade I lineage that is common in Africa.
Author summary: Escherichia clade I are rarely associated with disease and, because of the difficulty in differentiating them from Escherichia coli in routine laboratories, they are often misidentified as Escherichia coli, leading to the underestimation of their impact on the burden of disease. Additionally, some clones of Escherichia clade I also carry genetic markers that have been used to define enteroaggregative Escherichia coli (EAEC), a cause of persistent diarrhoea in developing countries and travellers' diarrhoea in developed economies. EAEC has also been associated with malnutrition and poor growth among children in developing economies. We here describe clones of Escherichia clade I (ST485) that carry enteroaggregative genes and were, in some cases, recovered from diarrhoeal cases. We show, from genomes deposited on EnteroBase and from our study, that this clone is globally disseminated, often associated with human infections and often misidentified as Escherichia coli. We also describe non-coli Escherichia other than Escherichia clade I isolated from humans. We suggest that the Escherichia clade I clone carrying enteroaggregative genes may be described as Enteroaggregative Escherichia clade I.

11

Co-infections and cryptic pathogens uncovered by metatranscriptomics in New Zealand's severe acute respiratory infections

Holdsworth, N.; French, R.; Waller, S.; Jelly, L.; Oneill, M.; de Vries, I.; Dubrelle, J.; French, N.; Bloomfield, M.; Winter, D.; Huang, Q. S.; Geoghegan, J. L.

2026-03-24 genomics 10.64898/2026.03.19.712874 medRxiv

Top 0.1%

22.6%

Severe acute respiratory infections (SARI) are a leading cause of hospitalisation and mortality globally. Many SARI cases remain undiagnosed because kit-based PCR diagnostic panels are typically limited to one or a small number of known pathogens and may fail to identify low-abundance infections or novel, poorly characterised organisms. Here, we used metatranscriptomic sequencing to profile the total infectome of 300 PCR-negative SARI nasopharyngeal samples collected through sentinel hospital-based surveillance in New Zealand between 2014 and 2021. Our analysis revealed actively transcribing potential pathogens in 43% of SARI cases, comprising 10 RNA viruses, three DNA viruses, nine bacterial species and four fungal species. Notably, co-infections occurred in 26% of cases, revealing polymicrobial infections missed by routine diagnostics. Human rhinoviruses were the most frequently identified pathogens, despite not being detected by PCR. Multiple common-cold coronaviruses, human parechovirus A1 and parainfluenza virus type 4 were also identified, although these were not included in the PCR screening panel. We also detected a range of bacterial and fungal species and uncovered highly expressed virulence and antimicrobial resistance genes. Infectome composition and diversity were shaped by key demographic and epidemiological factors, with the strongest effects observed for age and year of sample collection, indicating that host characteristics and temporal dynamics influence both microbial richness and community structure. These findings highlight the limitations of current diagnostic strategies and the value of metatranscriptomics for comprehensive microbial identification. Integrating such genomic approaches into both clinical and public health frameworks could improve diagnostic accuracy, enabling more sensitive detection and characterisation of potential pathogens while also strengthening surveillance and outbreak response.

12

PCR-free, targeted genomic sequencing using Dynamically optimized reference Adaptive Sampling (DORAS)

Borcard, L.; Gempeler, S.; Terrazos Miani, M. A.; Casanova, C.; Ramette, A.

2026-05-29 genomics 10.64898/2026.05.26.727915 medRxiv

Top 0.1%

22.5%

Whole genome sequencing (WGS) has become a cornerstone of clinical microbiology, enabling comprehensive analysis of microbial genome diversity. However, WGS is often computationally intensive and time-consuming when applied to specific applications like multilocus sequence typing (MLST), where only a subset of genes is needed for typing. This study evaluates the potential of adaptive sampling (AS), a software-based solution available on Oxford Nanopore Technologies (ONT) devices, to optimize sequencing runs for MLST by reducing the production of unnecessary reads falling outside the target areas. We demonstrate that AS, when used directly with the target gene sequences, does not reach sufficient target coverage compared with baseline WGS, owing to inefficient read recruitment. We therefore developed a novel, PCR-free approach, termed Dynamically Optimized Reference Adaptive Sampling (DORAS), which streamlines gene-specific enrichment by targeting genomic regions of interest and their genomic vicinity. DORAS first determines the genomic context of regions of interest for each sample, and then dynamically adjusts the length of the reference sequences during live sequencing. Consensus sequences are periodically constructed and evaluated for taxonomic classification. We demonstrate that full MLST profiles can be obtained in approximately half the time required for whole-genome sequencing to achieve 30X coverage (3 vs. 6 h), with no additional hands-on library preparation time. Validation on clinical isolates from hospital outbreaks of Corynebacterium diphtheriae and vancomycin-resistant Enterococci, and on routine clinical E. coli isolates, demonstrated consistent retrieval of MLST types compared with standard WGS methods. DORAS thus offers a cost-effective, efficient solution for routine surveillance and outbreak investigations based on MLST types in the clinical setting.

13

Comparative Genomics of Bovine and Human Fusobacterium necrophorum Strains Reveal Subspecies- and Host-associated Differences in Virulence and Antimicrobial Resistance

Kilama, J.; Holman, D. B.; Abbasi, M. A.; Amachawadi, R. G.; Nagaraja, T. G.; Dahlen, C. R.; Amat, S.

2026-04-23 microbiology 10.64898/2026.04.22.720273 medRxiv

Top 0.1%

22.4%

Fusobacterium necrophorum (FN) is an important opportunistic pathogen implicated in necrotizing infections, including liver abscesses, calf diphtheria, metritis, and foot rot in cattle, and tonsillopharyngitis in humans. However, FN also exists as a commensal member of the bovine reproductive microbiota, with potentially negative (as in metritis) and even positive associations with pregnancy outcomes. The genomic features that enable FN to colonize diverse hosts and anatomical niches as either a commensal or a pathogen are poorly understood. We addressed these knowledge gaps by performing comparative genomic analysis of 137 FN strains (80 newly sequenced, 57 publicly available) from clinical and non-clinical sources across human and bovine hosts. We investigated the pangenome structure, virulence gene repertoire, and prevalence of antimicrobial resistance genes (ARGs), as well as host- and subspecies-associated genomic signatures of two FN subspecies: subsp. necrophorum (FNN) and subsp. funduliforme (FNF). Comparative genomics revealed an open pangenome with high accessory diversity, and phylogenetic analysis separated the strains into two distinct subspecies clades. Functional profiling revealed substantial metabolic divergence between subspecies, with FNN showing higher prevalence of carbohydrate transport systems and advanced glycation-related pathways, while FNF showed enrichment in threonate metabolism and hemolysin-related systems. Virulence gene analysis identified 84 variants across multiple functional categories with subspecies- and host-specific distributions. Antimicrobial resistance genes, primarily tetracycline resistance genes [tet(O), tet(M), tet(40)] and the macrolide resistance gene erm(B), were detected in 22.6% of strains, with higher prevalence in bovine than human strains. 
Overall, our results suggest that the pathogenic potential of FN is determined by the interplay between an open pangenome, subspecies-specific metabolic and virulence repertoires, host-associated adaptation, and niche specialization. IMPORTANCE: Fusobacterium necrophorum, comprising two subspecies, necrophorum (FNN) and funduliforme (FNF), is a major pathogen in cattle and humans, yet it also occurs as a commensal inhabitant in healthy cattle, particularly in the rumen, hindgut, semen, and the female reproductive tract. Emerging evidence indicates that, in the bovine reproductive tract, its presence may be associated with improved pregnancy outcomes. We conducted a comparative genomic analysis of 137 F. necrophorum strains, including FNN (n = 12) and FNF (n = 125), sourced from humans (n = 53) and cattle (n = 84) across seven anatomical niches spanning both healthy and diseased sources. We identified subspecies- and host-specific metabolic pathways, antimicrobial resistance profiles, and distinct virulence gene distributions that underpin the ecological versatility of Fusobacterium necrophorum. Overall, these findings provide a genomic framework for understanding its host adaptation and niche specialization across bovine and human hosts.

14

Environmental reservoirs of high-risk ESBL- and carbapenemase-producing E. coli and Klebsiella in maternity wards in Yaounde (Cameroon): Whole-genome sequencing and antimicrobial susceptibility studies

Bessala, G. C.; Abomo, G. D.; Ngamaleu, R.; Essiben, F.; Wheeler, N.; Buckner, M. M. C.; Kreft, J. U.; Bougnom, B. P.

2026-03-18 epidemiology 10.64898/2026.03.16.26348525 medRxiv

Top 0.1%

22.1%

Background: The hospital environment is increasingly recognized as a critical reservoir for antimicrobial-resistant bacteria. In sub-Saharan Africa, maternity wards represent high-risk settings where environmental contamination poses a direct threat to vulnerable mothers and neonates. Despite this, there is a significant lack of data integrating phenotypic resistance with whole-genome sequencing (WGS) to understand antimicrobial resistance (AMR) in these settings. This study characterized the AMR patterns and genomic features of ESBL-producing Escherichia coli and Klebsiella spp. isolated from maternity ward surfaces in Yaounde, Cameroon. Methods: A cross-sectional environmental study was conducted across four maternity wards. Isolates were identified via standard microbiological methods, and antimicrobial susceptibility testing against 13 antibiotics was performed following EUCAST 2024 guidelines. Short-read WGS was used to identify sequence types (STs), plasmid incompatibility groups, antibiotic resistance genes (ARGs), and virulence factors. Plasmid-ARG association networks were constructed to visualize resistance dynamics. Results: Nineteen ESBL-producing Enterobacterales were identified, comprising 15 E. coli and four Klebsiella isolates. High levels of multidrug resistance were observed against ciprofloxacin, penicillins, and third-generation cephalosporins. While the isolates remained sensitive to colistin and imipenem, alarming resistance to meropenem was detected. Genomic analysis revealed the presence of globally disseminated high-risk lineages, including E. coli ST131, ST1193, and ST410, alongside Klebsiella ST1324 and ST489. Critical resistance determinants, including ESBLs, AmpC enzymes, and carbapenemases (NDM and OXA-48-like), were frequently associated with epidemic plasmids such as IncF, IncA/C2, and IncL/M. Additionally, the isolates harboured virulence factors characteristic of extraintestinal pathogenic Enterobacterales. 
Conclusions: The widespread presence of high-risk carbapenemase-producing clones on maternity ward surfaces identifies the hospital environment as a significant AMR reservoir in Yaounde. These findings highlight the urgent need for reinforced infection prevention and control (IPC) measures, robust antimicrobial stewardship, and the integration of genomic surveillance to safeguard highly susceptible maternal and neonatal populations from life-threatening infections.

15

Thirty years of Achromobacter ruhlandii evolution reveal pathways to epidemic lineages

Gabrielaite, M.; Johansen, H. K.; Juozapaitis, J.; Marvig, R. L.; Dudas, G.

2026-03-25 bioinformatics 10.64898/2026.03.25.714254 medRxiv

Top 0.1%

22.1%

Background: Achromobacter spp. are emerging opportunistic pathogens, associated with chronic infections, antimicrobial resistance, and poor clinical outcomes. The Danish epidemic strain (DES) of A. ruhlandii is highly drug-resistant and adapted to the cystic fibrosis (CF) airway, yet its evolutionary history and defining genomic features remain poorly understood. Methods: We analysed genome and antibiotic susceptibility testing data for 58 longitudinally collected DES isolates sampled over 21 years at Rigshospitalet, Denmark. We combined these with 79 publicly available A. ruhlandii genomes and applied phylogenomics to infer DES emergence and transmission, and genome-wide association studies (GWAS) to identify lineage-specific and adaptive genomic features. Results: DES forms a distinct monophyletic clade within A. ruhlandii, estimated to have emerged around 1990, with no evidence of dissemination beyond Denmark. GWAS identified key lineage-defining traits, including acquisition of large mobile genetic elements, plasmid integration events, and enrichment of resistance and iron acquisition genes. In addition, we detected other epidemic A. ruhlandii lineages with evidence of long-term persistence and inter-country spread, sharing similar genetic signatures of adaptation. Conclusions: This study elucidates the genomic features associated with chronic infection and epidemic potential in A. ruhlandii. The DES lineage illustrates how extensive horizontal gene transfer, high intrinsic resistance potential, and enhanced host-adaptation traits, such as increased iron acquisition, can facilitate the emergence and persistence of successful epidemic lineages. These findings highlight shared evolutionary signatures of epidemic A. ruhlandii and underscore the need for continued genomic surveillance to detect and monitor emerging high-risk lineages in chronic infections.

16

Refining the Serine Protease Autotransporters of Enterobacteriaceae (SPATE) gene detection in Enteroaggregative Escherichia coli genomes uncovers differential SPATE distribution by phylogeny

Dada, R. A.; Afolayan, A. O.; Adewuyi, O. A.; Tytler, B. A.; Olayinka, B. O.; Thomson, N. R.; Okeke, I. N.

2026-04-16 microbiology 10.64898/2026.04.16.715897 medRxiv

Top 0.1%

22.1%

Background: Enteroaggregative Escherichia coli (EAEC) are a heterogeneous pathotype, implicated in acute and persistent diarrhoea, especially in developing countries. Serine Protease Autotransporters of Enterobacteriaceae (SPATEs) are Type V Secretory System trypsin-like proteases repeatedly reported from EAEC. This study aimed to determine the prevalence of SPATE-encoding genes among EAEC and their association with diarrhoea. We screened 881 EAEC genomes from four recent epidemiological studies in Nigeria for 23 SPATE-encoding genes, initially using ARIBA and the VirulenceFinder database. Results: Initial screening inflated SPATE gene content, particularly in genomes with multiple SPATEs, due to cross-detection of highly similar sequences and other artefacts. We developed and validated a refined methodology, which detected 478 of 1,156 original SPATE calls and also identified SPATE miscalls from previous datasets in the literature. The most prevalent SPATE-encoding gene in our EAEC collection was sepA 297 (33.71%), closely followed by sat 360 (29.74%). pic, encoding a SPATE with mucinase activity, was found in 65 (7.4%) genomes and was associated with diarrhoea (p=0.00004). EAEC strains belonging to E. coli phylogroups A, B1 or C carried, on average, one SPATE gene per genome, while more than one was typically detected in phylogroup B2 EAEC. Other EAEC carried few or no SPATE genes. Conclusions: Our study shows that multifunctional genome analysis tools may have to be refined for certain gene families to avoid overestimation. SPATEs are not as prevalent as previously thought, but they remain common among EAEC, particularly among phylogroups A, B1, B2 and C, pointing to the possibility that they make lineage-specific contributions to disease.

17

The pQBR mercury resistance plasmids: a model set of sympatric environmental mobile genetic elements

Orr, V. T.; Harrison, E.; Rivett, D. W.; Wright, R. C. T.; Hall, J. P. J.

2026-03-27 microbiology 10.64898/2026.03.27.714766 medRxiv

Top 0.1%

21.9%

Plasmids are extrachromosomal mobile genetic elements that can facilitate rapid bacterial adaptation by transferring genes between individuals. While plasmids are known to exist in diverse habitats and encode a range of traits, most of our knowledge about plasmids comes from clinically-associated antimicrobial resistance (AMR) plasmids that have already been recruited as vectors of drug resistance and have likely been shaped by strong selection for plasmid-encoded resistance. Here, we investigated 26 plasmids from the pQBR collection -- a set of large, co-existing mercury resistance environmental plasmids isolated in Pseudomonas spp. from a field in Oxfordshire in the 1990s -- and explored the ability of pQBR plasmids to mobilise novel chromosomally-encoded traits. New whole genome sequences for 25 plasmids confirmed that these soil-isolated plasmids are generally very large (140-588 kb), constitute at least five distinct genetic groups, and have relatives in various other Pseudomonas species and habitats. Despite significant nucleotide-level divergence, Groups I (pQBR103-like, ~406 kb) and IV (pQBR57-like, ~328 kb) showed remarkable ancient similarities in synteny and gene content both with one another, and with the PInc-2 family of plasmids known to mobilise clinically significant drug resistance in Pseudomonas aeruginosa. None of the pQBR plasmids sequenced to date harboured known AMR determinants, but putative phage defence systems and metal resistances were evident. Transposable elements, including the Tn5042 mercury resistance transposon, were responsible for significant structural variation within plasmid groups, consistent with a predominant role of transposons in rapidly remodelling plasmids. 
To experimentally test the ability of pQBR plasmids to spread new traits, we developed a novel transposon mobilisation assay which showed that certain Group IV pQBR plasmids were especially effective at acquiring the chromosomally-encoded transposon Tn6291, and that this mobilisation was likely due to specific plasmid factors rather than generic conjugation rate. Our work presents a tractable set of sequenced plasmids suitable for exploring the evolution and dynamics of gene acquisition by pre-AMR plasmids, and provides a key case study highlighting the pervasive interplay between plasmids and transposable elements that can drive microbial genome evolution. Repositories: github.com/jpjh/PQBR_PLASMIDS Impact statement: Plasmids can drive microbial evolution by acting as vectors for horizontal gene transfer. Because of their central role in disseminating antimicrobial resistance (AMR), plasmids are mainly explored as vehicles for AMR traits, meaning that our knowledge of the diversity and evolutionary dynamics of non-AMR plasmids is more limited. Here, we explore sequences from a set of mercury resistance plasmids isolated in Pseudomonas spp. from pristine agricultural land that lack AMR determinants. By providing new whole genome sequencing analyses we expand the set of sequenced pQBR plasmids to 26, finding globally dispersed relatives from clinical, environmental, and industrial settings, and identifying an ancient plasmid backbone shared amongst divergent modern environmental and clinical AMR plasmids. We experimentally verify the role of pQBR plasmids in readily mobilising chromosomal traits using a novel transposon mobilisation assay, which suggests that specific plasmid-transposon interactions may drive trait spread. Overall, our work expands our understanding of the role of environmental plasmids in mobilising and disseminating adaptive traits.

18

Genomic characterization of Escherichia coli and Enterobacter hormaechei clinical isolates from a tertiary healthcare facility in Kenya

Musundi, S.; Kimani, R. W.; Waweru, H. K.; Wakaba, P.; Mbogo, D.; Essuman, S.; Onyambu, F.; Kanoi, B. N.; Gitaka, J.

2026-04-15 bioinformatics 10.64898/2026.04.13.718279 medRxiv

Top 0.1%

19.3%

Extended-spectrum beta-lactamase-producing Enterobacterales such as Escherichia coli and Enterobacter hormaechei represent a growing public health challenge in clinical settings, particularly in low- and middle-income countries, due to the escalating threat of antimicrobial resistance (AMR). In this study, we aimed to identify the antibiotic resistance genes present in E. coli (n=4) and E. hormaechei (n=3) clinical isolates. Multidrug-resistant phenotypes were confirmed using disc diffusion assays against 20 antibiotics. Whole-genome sequencing of resistant isolates was performed using Oxford Nanopore Technologies. Genome assembly and analysis revealed high-risk clones, including sequence type (ST) 1193 in E. coli and ST78 in E. hormaechei. All E. coli isolates harbored the blaCTX-M gene in their chromosomes along with point mutations conferring resistance to fluoroquinolones, while E. hormaechei isolates encoded blaACT in their chromosomes. Additionally, both species carried plasmids with multiple antibiotic resistance genes, including blaOXA and blaTEM, co-located with metal resistance operons, indicating the potential for horizontal gene transfer. BLAST analysis revealed high sequence similarity between the plasmids identified in clinical isolates and those previously recovered from environmental sources, highlighting the role of environmental reservoirs in AMR dissemination. Notably, no carbapenem resistance genes were detected in any isolate. These findings underscore the growing threat posed by multidrug-resistant Enterobacterales in clinical settings and emphasize the urgent need for strengthened infection prevention and control measures to mitigate AMR spread.

19

Genomic epidemiology of the 2017-2023 outbreak of Mycoplasma bovis sequence type ST21 in New Zealand

French, N. P.; Burroughs, A.; Binney, B.; Bloomfield, S.; Firestone, S. M.; Foxwell, J.; Gias, E.; Sawford, K.; van Andel, M.; Welch, D.; Biggs, P. J.

2026-04-10 genomics 10.64898/2026.04.07.717125 medRxiv

Top 0.1%

19.0%

Mycoplasma bovis was first detected in cattle in New Zealand in 2017, prompting an eradication programme that incorporated extensive surveillance and a test-and-cull policy. Genome sequence data and phylodynamic models were used to inform decision making throughout the eradication programme. Isolates from 697 cattle on 126 farms were collected and sequenced between July 2017 and December 2023. Phylodynamic models were used to estimate the time of most recent common ancestor, the effective reproduction number (Reff) and effective population size, and long-range and local between-farm transmission dynamics. The analysis revealed the dramatic impact of movement restrictions and culling up to early 2020, with a sharp reduction in the Reff to less than 1 in 2018/9 and the extinction of two of three major lineages in 2020. This was followed by three years of residual infection in farms in the South Island, associated with persistent infection of a large feedlot farm and nearby farms. The comprehensive dataset of genomic and epidemiological data provided a unique opportunity to study the dynamics of a country-wide outbreak of a single-host pathogen from first detection to potential eradication, underlining the utility of integrated genomic surveillance during an outbreak response. Author summary: The economically important cattle pathogen, Mycoplasma bovis, was first detected in New Zealand in 2017. This led to a large-scale, successful control programme aimed at eradication of the pathogen. The decision to undertake an eradication programme was informed by initial analyses of whole genome sequences from isolates collected as part of the surveillance programme. The analysis showed that the bacterium had entered New Zealand relatively recently and was unlikely to be widespread. 
Over the subsequent years, genome sequencing and modelling of transmission dynamics informed important policy decisions made by the New Zealand Government and the cattle industry, and helped to monitor progress of the eradication programme. The impact of the detection, movement control and culling programme was profound, with sharp reductions in transmission between 2018 and 2020. This was followed by a long tail of localised infection in the South Island, involving transmission from a large feedlot farm. Provisional eradication was achieved after depopulation of this feedlot. This analysis highlights the role of genomic surveillance and modelling to inform decision making during an infectious disease outbreak.

20

Temporal dynamics and acquisition of Shiga toxin subtype stx2a within Shiga toxin-producing Escherichia coli in England, 2016 to 2024

Hayles, E. H.; Rodwell, E. V.; Greig, D. R.; Jenkins, C.; Langridge, G. C.

2026-04-12 genetics 10.64898/2026.04.09.717390 medRxiv

Top 0.1%

18.9%

Shiga toxin-producing Escherichia coli (STEC) are an important public health concern due to their association with foodborne gastroenteritis and severe outcomes including haemolytic uraemic syndrome (HUS), particularly linked to the stx2a subtype of the Shiga toxin. We investigated the temporal dynamics and acquisition of stx2a among STEC isolates submitted to the United Kingdom Health Security Agency (UKHSA) between 2016 and 2024. A total of 12,888 whole genome STEC sequences and associated metadata were analysed. Of these isolates, 31.9% harboured stx2a, spanning 78 O serogroups, with a marked shift from STEC O157 to non-O157 serogroups over time. STEC O26:H11 and STEC O145:H28 were the primary drivers of observed increases, most commonly associated with stx2a alone or in combination with stx1a. The widespread and increasing presence of stx2a across the STEC population in England highlights an emerging public health risk and demonstrates the value of routine genomic surveillance in monitoring high-severity Shiga toxin subtypes.